JudgeLM: Fine-tuned Large Language Models are Scalable Judges
https://arxiv.org/abs/2310.17631
JudgeLM achieves high agreement with its teacher judge: over 90%, which even surpasses human-to-human agreement.
https://github.com/baaivision/JudgeLM
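The 90% figure above is an agreement rate. Read simply, that is the fraction of evaluated samples on which two judges reach the same verdict. Below is a minimal sketch of that computation. It assumes each verdict is one of "A", "B", or "tie" for a pair of candidate answers. That labeling, the function name, and the sample data are hypothetical and chosen for illustration; they are not taken from the JudgeLM code.

```python
def agreement(judge_verdicts, reference_verdicts):
    """Return the fraction of samples where two judges give the same verdict.

    Hypothetical helper: verdicts are assumed to be labels such as "A", "B", or "tie".
    """
    if len(judge_verdicts) != len(reference_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not judge_verdicts:
        raise ValueError("need at least one verdict")
    # Count positions where both judges chose the same label.
    matches = sum(j == r for j, r in zip(judge_verdicts, reference_verdicts))
    return matches / len(judge_verdicts)


# Hypothetical verdicts for five answer pairs.
student = ["A", "B", "tie", "A", "B"]
teacher = ["A", "B", "A", "A", "B"]
print(agreement(student, teacher))  # 0.8
```

The same function can compare two human annotators. That comparison gives the human-to-human baseline the claim refers to.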